Easy ways to make calls when vision is a challenge

FOX News

The upgraded Magnifier app stands out with iOS 18. Technology can be wonderfully convenient and a great source of entertainment, but it can also improve your everyday life. For those who experience visual challenges, a variety of apps and features can help. That's why we love this question about apps and features that can help visually challenged loved ones: "I am not tech savvy. I need to know if there is an app that I can download on a phone, that will allow my mother to tell the app, without needing internet services, who she wants to make a phone call to? She's losing her eyesight and can no longer see the numbers on her phone. She's 88 years old and doesn't own a computer and has limited income," writes "Sheryl" of Westminster, Colorado.


RoutePlacer: An End-to-End Routability-Aware Placer with Graph Neural Network

Hou, Yunbo, Ye, Haoran, Zhang, Yingxue, Xu, Siyuan, Song, Guojie

arXiv.org Artificial Intelligence

Placement is a critical and challenging step of modern chip design, with routability being an essential indicator of placement quality. Current routability-oriented placers typically apply an iterative two-stage approach, wherein the first stage generates a placement solution, and the second stage provides non-differentiable routing results to heuristically improve the solution quality. This approach prevents routability from being optimized jointly during placement. To address this problem, this work introduces RoutePlacer, an end-to-end routability-aware placement method. It trains RouteGNN, a customized graph neural network, to efficiently and accurately predict routability by capturing and fusing geometric and topological representations of placements. The well-trained RouteGNN then serves as a differentiable approximation of routability, enabling end-to-end gradient-based routability optimization. In addition, RouteGNN can improve two-stage placers as a plug-and-play alternative to external routers. Our experiments on DREAMPlace, an open-source AI4EDA platform, show that RoutePlacer can reduce Total Overflow by up to 16% while maintaining routed wirelength, compared to the state of the art; integrating RouteGNN within two-stage placers leads to a 44% reduction in Total Overflow without compromising wirelength.
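To make the end-to-end idea concrete, here is a minimal sketch of gradient-based placement with a frozen, differentiable congestion surrogate standing in for RouteGNN. It is not the authors' code: the surrogate is a plain MLP rather than a graph neural network, the netlist is random, and the HPWL wirelength and the weighting factor `lam` are illustrative assumptions.

```python
# Sketch only: optimize cell positions against wirelength plus a frozen learned
# routability surrogate, so routability gradients flow back to the placement.
import torch
import torch.nn as nn

class CongestionSurrogate(nn.Module):
    """Placeholder for a pretrained RouteGNN-style model: maps cell coordinates
    to a scalar congestion/overflow estimate. Assumed already trained and frozen."""
    def __init__(self, num_cells):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * num_cells, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, xy):
        return self.net(xy.flatten()).squeeze()

def hpwl(xy, nets):
    """Half-perimeter wirelength over a list of nets (each a LongTensor of cell ids)."""
    total = xy.new_zeros(())
    for net in nets:
        pins = xy[net]                                          # (pins, 2)
        total = total + (pins.max(0).values - pins.min(0).values).sum()
    return total

num_cells = 100
nets = [torch.randint(0, num_cells, (4,)) for _ in range(50)]   # toy random netlist
xy = nn.Parameter(torch.rand(num_cells, 2))                     # cell positions in [0, 1]^2

surrogate = CongestionSurrogate(num_cells)
for p in surrogate.parameters():
    p.requires_grad_(False)                                     # surrogate stays fixed

opt = torch.optim.Adam([xy], lr=1e-2)
lam = 0.5                                                       # routability weight (assumed)
for step in range(200):
    opt.zero_grad()
    loss = hpwl(xy, nets) + lam * surrogate(xy)                 # end-to-end objective
    loss.backward()
    opt.step()
```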


NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators with 3D-Stacked-DRAM

Wang, Junpeng, Ge, Mengke, Ding, Bo, Xu, Qi, Chen, Song, Kang, Yi

arXiv.org Artificial Intelligence

With the widespread use of deep neural networks (DNNs) in intelligent systems, DNN accelerators with high performance and energy efficiency are greatly demanded. As one of the feasible processing-in-memory (PIM) architectures, the 3D-stacked-DRAM-based PIM (DRAM-PIM) architecture enables large-capacity memory and low-cost memory access, which is a promising solution for DNN accelerators with better performance and energy efficiency. However, the low-cost characteristics of stacked DRAM and the distributed manner of memory access and data storage require us to rebalance the hardware design and DNN mapping. In this paper, we propose NicePIM to efficiently explore the design space of hardware architecture and DNN mapping of DRAM-PIM accelerators, which consists of three key components: PIM-Tuner, PIM-Mapper and Data-Scheduler. PIM-Tuner optimizes the hardware configurations, leveraging a DNN model for classifying area-compliant architectures and a deep kernel learning model for identifying better hardware parameters. PIM-Mapper explores a variety of DNN mapping configurations, including parallelism between branches of the DNN, DNN layer partitioning, DRAM capacity allocation and data layout patterns in DRAM, to generate high-hardware-utilization DNN mapping schemes for various hardware configurations. The Data-Scheduler employs an integer-linear-programming-based data scheduling algorithm to alleviate the inter-PIM-node communication overhead of data sharing brought by DNN layer partitioning. Experimental results demonstrate that NicePIM can optimize hardware configurations for DRAM-PIM systems effectively and can generate high-quality DNN mapping schemes with latency and energy cost reduced by 37% and 28% on average, respectively, compared to the baseline method.
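As a rough illustration of the PIM-Tuner idea (not NicePIM's implementation), the sketch below filters candidate hardware configurations with an area-compliance classifier and then ranks the survivors with a Gaussian process surrogate, a simple stand-in for deep kernel learning. The configuration encoding and the area/latency oracles are invented for the example.

```python
# Sketch: two-stage tuning loop over hypothetical DRAM-PIM hardware configurations.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical config encoding: [num_pim_nodes, macs_per_node, dram_banks, noc_width]
candidates = rng.integers(1, 16, size=(200, 4)).astype(float)

# Stand-in oracles; in practice these would come from area models and simulation.
def area_compliant(cfg):   return cfg[:, 0] * cfg[:, 1] < 100
def simulate_latency(cfg): return cfg[:, 0] * 2.0 + 50.0 / cfg[:, 1] + rng.normal(0, 0.1, len(cfg))

# Stage 1: train an area classifier on a labeled subset, then filter candidates.
labeled = candidates[:50]
clf = RandomForestClassifier().fit(labeled, area_compliant(labeled))
feasible = candidates[clf.predict(candidates)]

# Stage 2: fit a latency surrogate on a few simulated configs, rank the rest.
seed = feasible[:10]
gp = GaussianProcessRegressor().fit(seed, simulate_latency(seed))
best = feasible[np.argmin(gp.predict(feasible))]
print("next configuration to evaluate:", best)
```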


A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information

Araujo, Vladimir, Soto, Alvaro, Moens, Marie-Francine

arXiv.org Artificial Intelligence

Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied in a self-supervised fashion during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence dataset (bAbI) as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.
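A hedged sketch of how rehearsal- and anticipation-style auxiliary losses might be combined with the QA loss during training; the model interface, batch fields, and loss weights below are assumptions for illustration, not the authors' implementation.

```python
# Sketch: QA loss plus masked-token losses over past (rehearsal) and future (anticipation) spans.
import torch
import torch.nn.functional as F

def training_step(model, batch):
    """`model` is assumed to return QA logits plus token logits for masked past
    (rehearsal) and masked future (anticipation) positions."""
    qa_logits, past_logits, future_logits = model(
        batch["tokens"], batch["past_mask"], batch["future_mask"]
    )
    loss_qa = F.cross_entropy(qa_logits, batch["answer"])
    # Rehearsal: re-predict masked tokens already seen and stored in memory.
    loss_rehearse = F.cross_entropy(
        past_logits.flatten(0, 1), batch["past_targets"].flatten()
    )
    # Anticipation: predict masked tokens from the upcoming input.
    loss_anticipate = F.cross_entropy(
        future_logits.flatten(0, 1), batch["future_targets"].flatten()
    )
    return loss_qa + 0.5 * (loss_rehearse + loss_anticipate)   # weights are assumptions
```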


STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining

Guo, Liwei, Choe, Wonkyo, Lin, Felix Xiaozhu

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) inference is seeing increasing adoption by mobile applications, where on-device inference is desirable for preserving user data privacy and avoiding network roundtrips. Yet, the unprecedented size of an NLP model stresses both latency and memory, creating a tension between the two key resources of a mobile device. To meet a target latency, holding the whole model in memory launches execution as soon as possible but increases an app's memory footprint several times over, limiting its benefits to only a few inferences before being recycled by mobile memory management. On the other hand, loading the model from storage on demand incurs IO delays as long as a few seconds, far exceeding the delay range acceptable to a user; pipelining layerwise model loading and execution does not hide the IO either, due to the high skew between IO and computation delays. To this end, we propose Speedy Transformer Inference (STI). Built on the key idea of maximizing IO/compute resource utilization on the most important parts of a model, STI reconciles the latency vs. memory tension via two novel techniques. First, model sharding: STI manages model parameters as independently tunable shards and profiles their importance to accuracy. Second, elastic pipeline planning with a preload buffer: STI instantiates an IO/compute pipeline and uses a small buffer of preloaded shards to bootstrap execution without stalling at early stages; it judiciously selects, tunes, and assembles shards per their importance for resource-elastic execution, maximizing inference accuracy. Atop two commodity SoCs, we build STI and evaluate it against a wide range of NLP tasks, under a practical range of target latencies, and on both CPU and GPU. We demonstrate that STI delivers high accuracies with 1-2 orders of magnitude lower memory, outperforming competitive baselines.
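The pipelining idea can be sketched as follows: an IO thread keeps a small preload buffer of shards ahead of the compute loop, so execution starts immediately and stalls only if IO falls behind. Shard names, counts, and the sleep-based stand-ins for IO and compute are illustrative assumptions, not STI's code.

```python
# Sketch: overlap shard loading with layer execution via a small preload buffer.
import threading
import queue
import time

PRELOAD = 2                                   # shards loaded before execution starts
shards = [f"layer_{i}" for i in range(8)]     # hypothetical per-layer parameter shards
buffer = queue.Queue()

def load(shard):
    time.sleep(0.05)                          # stands in for reading shard weights from flash
    return shard, f"weights-of-{shard}"

def compute(shard, weights):
    time.sleep(0.02)                          # stands in for running one transformer layer

# Bootstrap: fill the preload buffer so compute does not stall at the early stages.
for s in shards[:PRELOAD]:
    buffer.put(load(s))

# IO thread streams the remaining shards while compute drains the buffer.
io_thread = threading.Thread(target=lambda: [buffer.put(load(s)) for s in shards[PRELOAD:]])
io_thread.start()
for _ in shards:
    shard, weights = buffer.get()             # blocks only if IO falls behind compute
    compute(shard, weights)
io_thread.join()
```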


Neural Network Quantization for Efficient Inference: A Survey

Weng, Olivia

arXiv.org Artificial Intelligence

As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks are largely due to their depth and complexity, making them difficult to deploy, especially on resource-constrained devices. Neural network quantization has recently arisen to meet this demand, reducing the size and complexity of a network by lowering its numerical precision. With smaller and simpler networks, it becomes possible to run neural networks within the constraints of their target hardware. This paper surveys the many neural network quantization techniques developed over the last decade. Based on this survey and comparison of neural network quantization techniques, we propose future directions of research in the area.
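As a concrete example of one surveyed family of techniques, the sketch below applies uniform affine (asymmetric) post-training quantization to a weight tensor, mapping floats to 8-bit integers via a scale and zero-point; the tensor and bit-width are illustrative.

```python
# Sketch: uniform affine quantization and dequantization of a weight matrix.
import numpy as np

def quantize(w, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)              # float step per integer level
    zero_point = np.round(qmin - w.min() / scale).astype(np.int32)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(256, 256).astype(np.float32)
q, s, z = quantize(w)
print("max reconstruction error:", np.abs(w - dequantize(q, s, z)).max())
```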


Autonomous vehicles to leverage HD maps from space

#artificialintelligence

It may not be common knowledge, but the automotive industry is in deep discussions to find out how aeronautic technology can benefit the next generation of road vehicles. More specifically, satellite imaging firms are using their expertise to assist the creation of high-definition (HD) maps, which can optimise autonomous vehicle (AV) navigation, ride-share operations and last-mile delivery services. Maxar is based in Westminster, Colorado. From here, it runs a global business you've never heard of, but will almost certainly have used. Its constellation of satellites circles the earth once every 90 minutes--that's 16 revolutions per day--on what is called a sun-synchronous orbit.


Satellite images and machine learning can identify remote communities to facilitate access to health services

#artificialintelligence

Community health systems operating in remote areas require accurate information about where people live to efficiently provide services across large regions. We sought to determine whether a machine learning analysis of satellite imagery can be used to map remote communities to facilitate service delivery and planning. We developed a method for mapping communities using a deep learning approach that excels at detecting objects within images. We trained an algorithm to detect individual buildings, then examined building clusters to identify groupings suggestive of communities. The approach was validated in southeastern Liberia by comparing algorithmically generated results with community location data collected manually by enumerators and community health workers. The deep learning approach achieved 86.47% positive predictive value and 79.49% sensitivity with respect to individual building detection. The approach identified 75.67% (n = 451) of communities registered through the community enumeration process, and identified an additional 167 potential communities not previously registered. Several instances of false positives and false negatives were identified.
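A minimal sketch of the clustering step and the reported metrics, under stated assumptions: the detected building coordinates are synthetic, DBSCAN is a stand-in for whatever grouping rule the study used, and the true/false positive counts passed to the metric function are illustrative only.

```python
# Sketch: group detected buildings into candidate communities; compute PPV and sensitivity.
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical building detections as (lon, lat) pairs from an object detector.
detections = np.random.default_rng(1).uniform(0, 1, size=(500, 2))

# Cluster nearby buildings; eps is an assumed maximum within-community spacing.
labels = DBSCAN(eps=0.02, min_samples=5).fit_predict(detections)
communities = [detections[labels == k].mean(axis=0) for k in set(labels) if k != -1]
print(f"candidate communities: {len(communities)}")

def ppv_and_sensitivity(tp, fp, fn):
    """Positive predictive value and sensitivity, the metrics quoted in the abstract."""
    return tp / (tp + fp), tp / (tp + fn)

print(ppv_and_sensitivity(tp=310, fp=48, fn=80))   # illustrative counts only
```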


Unifying System Health Management and Automated Decision Making

Balaban, Edward, Johnson, Stephen B., Kochenderfer, Mykel J.

Journal of Artificial Intelligence Research

Health management of complex dynamic systems has evolved from simple automated alarms into a subfield of artificial intelligence with techniques for analyzing off-nominal conditions and generating responses. This evolution took place largely apart from the development of automated system control, planning, and scheduling (generally referred to in this work as decision making). While there have been efforts to establish an information exchange between system health management and decision making, successful practical implementations of integrated architectures remain limited. This article proposes that rather than being treated as connected yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and algorithms, we believe that a unified approach will increase systems' resilience to faults and improve their effectiveness. We overview the prevalent system health management methodology, illustrate its limitations through numerical examples, and describe a proposed unified approach. We then show how typical system health management concepts are accommodated in the proposed approach without loss of functionality or generality. A computational complexity analysis of the unified approach is also provided.
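One way to read the unification argument is that health estimation becomes a belief over health states inside the decision problem itself, rather than a separate alarm generator feeding a controller. The toy sketch below (an assumed model, not the article's formulation) updates a belief from one sensor observation and picks the action with the highest expected reward under that belief.

```python
# Sketch: belief over health states and expected-reward action selection in one formulation.
import numpy as np

states = ["nominal", "degraded", "failed"]
belief = np.array([0.9, 0.09, 0.01])                 # current belief over health states

# Observation likelihoods P(sensor_reading_high | state); values are assumptions.
obs_likelihood = np.array([0.05, 0.6, 0.95])

def update_belief(belief, likelihood):
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Rewards for each action in each health state: keep operating vs. enter safe mode.
reward = {"continue": np.array([10.0, 2.0, -100.0]),
          "safe_mode": np.array([1.0, 1.0, 1.0])}

belief = update_belief(belief, obs_likelihood)       # a high sensor reading arrives
action = max(reward, key=lambda a: reward[a] @ belief)
print(belief.round(3), "->", action)
```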


PABO: Pseudo Agent-Based Multi-Objective Bayesian Hyperparameter Optimization for Efficient Neural Accelerator Design

Parsa, Maryam, Ankit, Aayush, Ziabari, Amirkoushyar, Roy, Kaushik

arXiv.org Machine Learning

The ever-increasing computational cost of Deep Neural Networks (DNN) and the demand for energy-efficient hardware for DNN acceleration have made accuracy and hardware cost co-optimization for DNNs tremendously important, especially for edge devices. Owing to the large parameter space and the cost of evaluating each parameter in the search space, manual tuning of DNN hyperparameters is impractical. Automatic joint DNN and hardware hyperparameter optimization is indispensable for such problems. Bayesian optimization-based approaches have shown promising results for hyperparameter optimization of DNNs. However, most of these techniques have been developed without considering the underlying hardware, thereby leading to inefficient designs. Further, the few works that perform joint optimization are not generalizable and mainly focus on CMOS-based architectures. In this work, we present a novel pseudo agent-based multi-objective hyperparameter optimization (PABO) for maximizing DNN performance while obtaining low hardware cost. Compared to existing methods, our work takes a theoretically different approach to the joint optimization of accuracy and hardware cost and focuses on memristive crossbar-based accelerators. PABO uses a supervisor agent to establish connections between the posterior Gaussian distribution models of network accuracy and hardware cost requirements. The agent reduces the mathematical complexity of the co-optimization problem by removing unnecessary computations and updates of acquisition functions, thereby achieving significant speed-ups for the optimization procedure. PABO outputs a Pareto frontier that underscores the trade-offs between accuracy and hardware efficiency. Our results demonstrate superior performance compared to the state-of-the-art methods, both in terms of accuracy and computational speed (~100x speed-up).
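For a concrete picture of the output, the sketch below extracts a Pareto frontier from synthetic (validation error, hardware cost) pairs; it illustrates the trade-off curve PABO reports, not the PABO algorithm itself.

```python
# Sketch: Pareto-frontier extraction over two objectives to be minimized.
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical evaluated settings: column 0 = validation error, column 1 = hardware cost.
points = rng.uniform(0, 1, size=(100, 2))

def pareto_front(points):
    """Keep points not dominated by any other point (lower is better on both objectives)."""
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points <= p, axis=1) & np.any(points < p, axis=1))
        if not dominated:
            keep.append(i)
    return points[keep]

front = pareto_front(points)
print(f"{len(front)} Pareto-optimal configurations out of {len(points)}")
```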